********************************************************************************
******************** HILDA MERGE CODE - households only ************************
********************************************************************************
************************************ Notes *************************************
/* Title: HILDA household merge code										  */
/* Author: Benjamin Beckers													  */
/* Contact: beckersb@rba.gov.au												  */
/* Created: 15.10.2017														  */
/* Last edited: 15.11.2017													  */
/* Description: This file loads the unbalanced HILDA individual file created in
"HILDA merge code.do" and returns an unbalanced and balanced household file. 
The following procedure is applied:
1. Determine household heads in first year (and the top-up sample(s)), and 
assign the household head ID and a unique household ID to all members of the 
household. Determining the household head follows the standard tiebreaking 
procedure:
	a) lone person, lone parent, or being in a relationship
	b) highest income earner
	c) eldest person
	d) ID (random)
2. For each following year: identify household splits and mergers, and the 
source of change:
	a) No change to composition of household
	b) "death" of HH head, but remaining members still form a household
	c) "death" of any other member but the household head
	d) Split of a couple
	e) Split of multifamily household with more than two adults (other than 
	   nondependent children)
	f) Merger with a HILDA outsider (birth)
	g) Merger with a HILDA insider (household merger)
Move household IDs forward for as many households as possible following these
rules dependent on the identified change above:
	a) Keep original HH ID, household head remains with the same individual
	b) Keep original HH ID, assign new household head following tiebreaker rules
	c) Keep original HH ID, household head remains with the same individual
	d) Reassign new HH IDs to both, assign new household heads in case of 
	   simultaneous merger with existing household
	e) As d)
	f) Keep original HH ID, household head remains with the same individual
	g) If one household existed for longer in HILDA: keep older HH ID, assign HH 
	   head to head of the older HH. Otherwise: Reassign new HH ID, determine HH
	   head by tiebreaker.
Note on f): Children and dependent students within HILDA who move into a new
household with HILDA outsiders and are still children/dependent students do not 
become household heads and a new household ID is assigned.

Several checks are applied. Only household heads are kept and the unbalanced
panel is saved. Only households that are observed in every wave are kept and the
balanced panel is saved.

IMPORTANT:
Update the first section as directed but do not change any of the code in the 
other sections!	
Run the whole file at once; if sections are run separately, the local macros will be lost!
For assistance with the code, write to ER-MircoAnalysisAndData@rba.gov.au.
All changes to this program should be recorded in the version control section at
the bottom and shared with ER-MircoAnalysisAndData@rba.gov.au.				  */

// Minor updates by Alex Ballantyne August/September 2021
// - ad hoc dealing with unstable composition family at "Assert failed position 7"
********************************** End Notes ***********************************

clear all
set more off
capture log close

********************************************************************************
************* 1. Parts to change ***********************************************
********************************************************************************

** Set up local directories // update as necessary


local readdatadir "<place directory here>"
local writedatadir "<place directory here>"
local outputdir "<place directory here>"

** Set name of input file
local unbalancedname "Unbalanced" // the last term is the file name you assigned to the unbalanced data in "HILDA merge code.do"

** Set name of output files
local unbalancedHHname "UnbalancedHH" // the last term is the file name you assign to your new HH unbalanced data file
local balancedHHname "BalancedHH" // the last term is the file name you assign to your new HH balanced data file
local saveinterim = 1 // 1 if interim unbalanced household files shall be saved in every year of the loop, 0 if not.

log using "`outputdir'hhpanel.log", replace

** Specify which variables to select
local loadsubset = 0 // 1 if only the subset `varselect' shall be loaded, 0 if all variables from individual dataset shall be loaded
* DO NOT CHANGE THIS LINE: These variables are needed to identify households (add lines as necessary)
local varselect xwaveid hhid year hhrih hgage hgsex wave hhpxid hhpers hhtype hhfam hhmove tifdip tifdin edhigh1 hhtup hgdob hhfxid hhmxid hhbmxid hhbfxid hhidpw pntrisk ///
	hgint hhiu hhresp hhrih hhwtrp hhpers mhrealb  hh0_4 hh5_9  hh10_14 hhpcode hslnoth hhid  hhrih hhpid hgint hhpno hslnoth hhadult hhwth  ///  Interview variables 
	hgage hgsex ancob hhstate hhmsr hhsos mrcurr anatsi anengf tcr tcnr tchad tcr04 tcr514 anyoa mhnyr fiprosp hhssa4 /// Demographic variables 
	esempst  es esdtl esbrd jbcasab capune ehtjbyr ehtjb hhura jbmssec nlrealt jbmpg nlan4wk ujlji61 jst hges hwtbani hweqini hwcaini hwccdti rtage /// Employment variables 
    wsfg wsfga wsfna bifiga wsfef wsfei wscei hifdip hifdin /// Wages and salary variables
    bifuga bifip bifin biff /// Business Income
    oifinta oifrnta oifroya oifdiva bifdiva oifinip oifinin oifinf oiinti /// Investment Income  
	oifsupi oifwkci oifppi oifppf  /// Private Pensions
	tifmkip tifmkin /// Regular market income 
    tifpiip tifpiin /// Regular private Income 
    oifchs oifnptr oifohha oifpria  oifpti oifptf oifpnt oifpnta  /// Private Transfers
    bnfpeni bnfpari bnfalli bnfisi  bnffama  bnfonii bnfnisi bnfisf bnfobi bnfapti bnfaptf  /// Aust. Govt. Public Transfers
    bnfrpi bnffpi oifrsvi oifinha oifnpt oifrpt oifohhl oiflswa oifpria oifoiri oifwfli /// Irregular Income 
    tifeftp tifeftn txtottp txtottn tifditp tifditn  /// Gross Total Income measures (inc. Irr.)
    tifefp tifefn   txtotp txtotn tifdip tifdin  /// Gross Total Income measures (exc. Irr.)
	edhigh edfts edagels edhists edtypes jbmploj jbmplej jbmpgj jbempt jbmo62 jbmo61 jbmcnt jbmtuea jbn jbocct jbmi61 jbmi62 jbmii2 jbemlwk jbmhruc jbprhr jbtprhr  jbptrea jbhrcpr jbhrqf jbhruc ehtujyr hhwtrp /// 
	hsvalue  hsmgowe hsten hsrnti hsmgi hssli hsmgpd hsmguse hsslowe hsslf hsmgf fmagelh hslnoth lshrcom gh1 hsmgfin hhsra /// other variables 
	mhreaas mhreabn mhreaff mhreals mhreahn mhrealb mhrealw mhreanj mhreaob mhreawr mhreawp mhrearb mhreapo mhreaev mhreagh mhreafm mhreahr mhreamb mhreamr mhreaos mhreapf mhreapn mhreasm mhreawt ///moving
	jbmmply lefrd pjmsemp pjoii2 pjljr pjljrea pjsemp jbmhrua jbmhruw jbhruw jbhrua anatsin lnwtrp hhmove lertr hhmove hhmovek jbmwpsz hhtup hhtuh hifboni hifbonf bnfboni bnfonf bnfbon ssfai ssfaf ssfa ///
	jbawmhr hhwthm hhwtrpm  jbmhrv jbmhru jbhrv jbhru  hspown bnfboni rtcomp ///
	hwnwip hwnwin hwassei hwdebti hwfini hwnfii hwtpvi hwsupei hwtbani hsyr hwopdti hsloana hsprice firisk fiprosp fiprbeg fiprbmr fiprbps fiprbwm fiprbuh fiprbfh fiprbwo ///
	hxyalc hxycig hxyeduc hxygroc  hxyhmrn hxymeal hxymvf hxymvr hxypbti hxyphi hxypubt hxyutil ///
	hxyhltp hxyoi hxyphrm hxyteli hxywcf hxyccf hxymcf hxycomp hxyfurn hxyhol hxyncar hxytvav hxyucar ///
	fiprbeg fiprbfh fiprbmr fiprbps fiprbuh fiprbwm fiprbwo firisk jttnum jttrwrk fisave fisavep fisedch fishlpc tifditp tifditn hwassei hwdebti hwfini hwhmdti hwhmvai hwtpdi hwtpvi hwtbani hweqini hwcaini hwccdti ///

/// you can add lines to include any variables you want


*** Open unbalanced data file
if `loadsubset'==1 {
	use `varselect' using "`readdatadir'`unbalancedname'.dta", clear
}
else {
	use "`readdatadir'`unbalancedname'.dta", clear
}
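
* Editor's sketch (an optional guard, not part of the original procedure):
* stop early with a clear message if a variable that the later sections rely on
* is missing from the input file. The list below is an assumption based on the
* variables referenced in this file; extend it as needed.
foreach v in xwaveid hhid year wave hhrih hhpers hhfam hhpxid hhtuh hgage tifdip tifdin esbrd {
	capture confirm variable `v'
	if _rc!=0 {
		display as error "Required variable `v' not found in input file"
		exit 111
	}
}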

replace fiprbeg = fipreg if wave==10
replace fiprbfh = fiprfh if wave==10
replace fiprbmr = fiprmr if wave==10
replace fiprbps = fiprps if wave==10
replace fiprbuh = fipruh if wave==10
replace fiprbwm = fiprwm if wave==10
replace fiprbwo = fiprwo if wave==10
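* Household-level labour market indicators (unemployed: esbrd==2, employed: esbrd==1)
gen unempdum = 0 if esbrd!=.
replace unempdum = 1 if esbrd==2
bysort hhid wave: egen sumunemp = total(unempdum)
* Number of household members with a valid labour force status
gen responding = 0
replace responding=1 if esbrd!=. & esbrd>0
bysort hhid wave: egen hhadult2 = total(responding)
gen propunemp = sumunemp/hhadult2 // share of members with valid status who are unemployed
gen empdum = 0 if esbrd!=.
replace empdum = 1 if esbrd==1
bysort hhid wave: egen sumemp = total(empdum)
gen propemp = sumemp/hhadult2 // share of members with valid status who are employed

* Household total of oiinti
bysort hhid wave: egen suminterest = total(oiinti)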

** Variables that require aggregation
* Compute mean at the household level of these variables
local varmean hgage edhigh1 esbrd esdtl jbmcnt jbmploj jbmpgj jbmplej jbmssec jbhrcpr pntrisk firisk fiprosp fiprbeg fiprbmr fiprbps fiprbwm fiprbuh fiprbfh fiprbwo unempdum fiemerf fibfin
* Compute minimum at the household level of these variables (to construct dummies for at least 1 (unemployed) member in household)
local varmin hgage edhigh1 esbrd esdtl hglth jbmcnt pntrisk firisk fiprosp fiprbeg fiprbmr fiprbps fiprbwm fiprbuh fiprbfh fiprbwo jbmploj unempdum fiemerf fibfin 
* Compute maximum at the household level of these variables (to construct dummies for at least 1 (unemployed) member in household)
local varmax hgage edhigh1 esbrd esdtl hges anengf jbmcnt pntrisk firisk fiprosp fiprbeg fiprbmr fiprbps fiprbwm fiprbuh fiprbfh fiprbwo jbmploj unempdum fiemerf fibfin
* Compute range at the household level of these variables (THESE MUST BE AMONG THE MIN - MAX VARIABLES!)
local varrange hgage edhigh1 
* Specify which variables need to be carried forward and aggregated on the individual level 
local varlag tifditp tifditn hwassei hwdebti hwfini hwhmdti hwhmvai hwtpdi hwtpvi
replace edhigh1 = -10 if edhigh==10
replace jbmcnt = -10 if jbmcnt==8
replace jbmploj = -10 if jbmploj>100
replace jbmpgj = -10 if jbmpgj>100

** Sample Selection
* First and last year
summarize year
local firstyear = r(min) 	// Change to user-specified numeric if k oldest waves are not needed
local lastyear = r(max) 	// Change to user-specified numeric if k newest waves are not needed
disp "Sample: `firstyear' - `lastyear'"
* Panel with gap years
local increment = 1 		// Change to user-specified numeric of step-size between years
/* Add possibility to select irregular sequence of years? */

** Drop (string) variables that are not needed (CHECK!)
drop hhpno hhpid hhrhid hgdob hhfxid hhmxid hhbmxid hhbfxid hhidpw


///DO NOT MAKE ANY CHANGES TO THE FILE BEYOND THIS POINT///

********************************************************************************
************* 2. Create necessary variables for panel construction *************
********************************************************************************
destring xwaveid, replace
destring hhid, gen(hhidnum)

*** Identify top-up households in the year of sample top-up
bysort xwaveid (year) : gen nobs = _n
bysort hhidnum year : egen maxnobs = max(nobs)
gen tuyear = (hhtuh==1 & nobs==1 & maxnobs==1)
drop nobs maxnobs

* Generate individual total disposable regular income
gen inc = tifdip-tifdin

* Generate lagged variables of interest by individual (to be aggregated at the household level - accounting for new HH composition)
/* sort xwaveid year
foreach x of local varlag {
	by xwaveid : gen `x'l1 = L1.`x'
} */

* Years of first and last observation per individual (the last one is used to identify deaths)
bysort xwaveid : egen fobs = min(year)
bysort xwaveid : egen lobs = max(year)

* Define multi-family HHs: count couples and unrelated adults
bysort year hhidnum : egen ncpls = total(hhrih<5) // Number of individuals in couple relationships
bysort year hhidnum : egen ncpura = total(hhrih<8 | hhrih>10) // Number of partnered individuals, lone parents, and other related or unrelated adults in HH (everyone but children)
replace ncpura = ncpura-ncpls/2 // Count each couple only once: number of couples, lone parents, and other related or unrelated adults in HH
gen trash = 1 if hhfam==0
bysort year hhidnum : egen nfams1 = total(trash)
bysort year hhidnum : egen nfams2 = max(hhfam)
gen nfams = nfams1 + nfams2 // Number of families in HH
drop trash nfams1 nfams2

* Create original household id to be followed as long as possible
local alphabet = "abcdefghijklmnopqrstuvwxyz"
gen wave2 = substr("`alphabet'",wave,1)
gen yearstr = string(year)
sort year hhidnum xwaveid
gen hhid0 = yearstr+wave2+hhid if (year==`firstyear' | tuyear==1)
order xwaveid hhidnum year hhid0 hhrih inc hgage hgsex


********************************************************************************
******************* 3. Identify household heads in t=0 *************************
********************************************************************************

*** Identify household heads
* Lone persons are automatically household heads
cap assert hhpers==1 if hhrih==12
if _rc!=0 {
	display as error "Assert failed position 1"
	exit 999
}
gen hhhead = 1 if hhrih==12 & (year==`firstyear' | tuyear==1)

*** Identify as many individuals as possible who do not qualify as HH heads
* Child younger than 15 or dependent student
replace hhhead = 0 if hhrih>=8 & hhrih<=9 & (year==`firstyear' | tuyear==1) // DEPENDENT children are never HH heads, NONDEPENDENT children (hhrih==10) may be household heads if they (re)form a household with their (e.g. elderly) parents
* Other family member not part of a couple or parent-child relationship but a couple or parent is present in HH
bysort year hhidnum : egen hhrih0 = min(hhrih)
replace hhhead = 0 if hhrih==11 & hhrih0<8 & (year==`firstyear' | tuyear==1)
* Unrelated individuals to all other HH members but a couple or parent is present in HH
replace hhhead = 0 if hhrih==13 & hhrih0<8 & (year==`firstyear' | tuyear==1)
order xwaveid hhidnum year hhid0 hhhead hhrih inc hgage hgsex

* Individuals that are the only potential household head after excluding all above (Lone parents)
bysort year hhidnum : egen numpothead = total(hhhead==.)
tab hhrih if hhhead==. & numpothead==1 & (year==`firstyear' | tuyear==1) // Check if households with only one potential head are indeed lone persons or parents
replace hhhead = 1 if hhhead==. & numpothead==1 & (year==`firstyear' | tuyear==1)
drop numpothead

** Tiebreaker rules for households with couples, multiple lone parents, or parents and non-dependent children
tab hhrih if hhhead==. & year==`firstyear' // Which households are left
gen pothead = hhhead==. // You are a potential head if you haven't been assigned a value
* Highest income in first year household is observed
bysort year hhidnum pothead : egen incm = max(inc) if (year==`firstyear' | tuyear==1)
sort year hhidnum hhhead inc xwaveid
by year hhidnum : gen inc1 = (_n==_N) if (year==`firstyear' | tuyear==1)
by year hhidnum : replace hhhead = 1 if (inc1==1 & inc!=inc[_n-1] & pothead==1 & (year==`firstyear' | tuyear==1))
by year hhidnum : replace hhhead = 0 if (inc!=incm & pothead==1 & (year==`firstyear' | tuyear==1))
drop inc1 incm
replace pothead = hhhead==.
* Oldest person
bysort year hhidnum pothead : egen hgagem = max(hgage) if (year==`firstyear' | tuyear==1)
sort year hhidnum hhhead hgage xwaveid
by year hhidnum : gen hgage1 = (_n==_N) if (year==`firstyear' | tuyear==1)
by year hhidnum : replace hhhead = 1 if (hgage1==1 & hgage!=hgage[_n-1] & pothead==1 & (year==`firstyear' | tuyear==1))
by year hhidnum : replace hhhead = 0 if (hgage!=hgagem & pothead==1 & (year==`firstyear' | tuyear==1))
drop hgage1 hgagem
replace pothead = hhhead==.
* Xwaveid (random)
sort year hhidnum hhhead xwaveid
by year hhidnum : gen hhobs = (_n==_N) if (year==`firstyear' | tuyear==1)
by year hhidnum : replace hhhead = 1 if (hhobs==1 & pothead==1 & (year==`firstyear' | tuyear==1))
by year hhidnum : replace hhhead = 0 if (hhobs==0 & hhhead==. & (year==`firstyear' | tuyear==1))
drop hhobs
replace pothead = hhhead==.
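
/* Editor's sketch (not part of the original procedure; nothing in this comment
   block is executed). It illustrates the order of the tiebreaker above on
   invented data: highest income first, then eldest, then highest ID. A single
   lexicographic sort is used here as a compact stand-in for the step-by-step
   code above; the names hh, id, inc, age and head are hypothetical. To try it,
   paste the lines below into a separate Stata session.

clear
input hh id inc age
1 101 50000 40
1 102 50000 45
2 201 30000 60
2 202 30000 60
3 301 10000 30
3 302 20000 25
end
* Sorting by income, then age, then ID puts the winner last within each household
sort hh inc age id
by hh : gen head = (_n==_N)
list, sepby(hh)
* Heads: 102 (income tie, older), 202 (full tie, higher ID), 302 (higher income)
*/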

* Check that household head status (0/1) has been determined for every individual
cap assert !missing(hhhead) if (year==`firstyear' | tuyear==1)
if _rc!=0 {
	display as error "Assert failed position 2"
	exit  999
}

* Check if each household has exactly one household head
by year hhidnum : egen chkhhhead = total(hhhead) if (year==`firstyear' | tuyear==1)
cap assert chkhhhead==1 if (year==`firstyear' | tuyear==1)
if _rc!=0 {
	display as error "Assert failed position 3"
	exit  999
}
drop chkhhhead

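* Assign household head ID to all individuals in household
* (hhhead is temporarily negated so that the head (-1) sorts first within each
*  household; the head's xwaveid is then copied down to all other members)
replace hhhead =-1*hhhead
sort year hhidnum hhhead
by year hhidnum hhhead : gen hhheadid = xwaveid
by year hhidnum : replace hhheadid = hhheadid[_n-1] if hhhead>-1
replace hhhead =-1*hhhead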
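
/* Editor's sketch (not part of the original procedure; nothing in this comment
   block is executed). It shows the negate-sort-copy idiom used above on one
   invented household; hh, id, head and headid are hypothetical names. To try
   it, paste the lines below into a separate Stata session.

clear
input hh id head
1 11 0
1 12 1
1 13 0
end
replace head = -1*head          // head becomes -1 and sorts first
sort hh head
by hh head : gen headid = id    // everyone starts with their own ID
by hh : replace headid = headid[_n-1] if head>-1   // non-heads copy from the row above
replace head = -1*head          // restore the 0/1 coding
list
* Because replace works through the observations in order, the head's ID
* cascades down: all three members end up with headid 12.
*/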

* In which year is current household head last observed in sample? (Necessary to track deaths)
bysort year hhidnum : gen trash = lobs if hhhead==1
by year hhidnum : egen hhlobs = total(trash)
drop trash

* In which year is household first observed?
gen hhfobs = year if (year==`firstyear' | tuyear==1)

*** Generate empty lagged variables for future updating
gen hhidnuml1 = .
gen hhfobsl1 = .
gen hhheadidl1 = .
gen hhheadl1 = .
gen hhpersl1 = .
gen hhrihl1 = .
gen hhpxidl1 = ""
gen hhlobsl1 = .
gen ncpural1 = .
gen ncplsl1 = .
gen nfamsl1 = .

*** Collect information on splits and mergers
gen splits = .
gen splitscpl = .
gen splitsmf = .
gen splitshead = .
gen splitsnonhead = .
gen mergeout = .
gen mergeoutchild = .
gen mergeoutli = .
gen mergeouthi = .
gen mergein = .

if `saveinterim'==1 {
	save "`writedatadir'/UHH`firstyear'.dta", replace
}

********************************************************************************
****************** 4. Identify household changes in t+1 ************************
********************************************************************************
local y = `firstyear' // start from the first sample year set in section 1 (2001 for the full HILDA sample)
while `y'<`lastyear' {
	local y = `y'+`increment'
quietly {
	noisily disp "Now running: year `y'"
	* Bring past household characteristics forward
	sort xwaveid year
	by xwaveid : replace hhidnuml1 = hhidnum[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace hhfobsl1 = hhfobs[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace hhheadidl1 = hhheadid[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace hhheadl1 = hhhead[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace hhpersl1 = hhpers[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace hhrihl1 = hhrih[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace hhpxidl1 = hhpxid[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace hhlobsl1 = hhlobs[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace ncpural1 = ncpura[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace ncplsl1 = ncpls[_n-1] if year==`y' & year==year[_n-1]+1
	by xwaveid : replace nfamsl1 = nfams[_n-1] if year==`y' & year==year[_n-1]+1
	sort year hhidnum
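
	/* Editor's sketch (not part of the original procedure; nothing in this
	   comment block is executed). It illustrates on invented data why the
	   lagged values above are only filled when the previous observation is
	   exactly one year earlier; id, hh and hhl1 are hypothetical names. To
	   try it, paste the lines below into a separate Stata session.

	clear
	input id year hh
	1 2001 10
	1 2002 10
	1 2004 11
	2 2002 20
	2 2003 21
	end
	sort id year
	by id : gen hhl1 = hh[_n-1] if year==year[_n-1]+1
	list, sepby(id)
	* hhl1 is 10 for id 1 in 2002 and 20 for id 2 in 2003. It is missing for
	* each person's first observation and for id 1 in 2004, because 2003 is
	* not observed and the lag is therefore not carried across the gap.
	*/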
	
	****************************************************************************
	************* 4a. Unchanged households *************************************
	****************************************************************************
	*** Check if household composition remained exactly the same (number and IDs of individuals)
	bysort year hhidnum : gen chknum = (hhpers==hhpersl1 & year==`y') // number of individuals does not change
	bysort year hhidnum : egen chknumhh = min(chknum)
	bysort year hhidnum (hhidnuml1): gen chkcomp = (hhidnuml1==hhidnuml1[1]) if year==`y' // composition of individuals does not change
	by year hhidnum : egen chkcomphh = min(chkcomp)
	replace hhhead=hhheadl1 if (chknumhh==1 & chkcomphh==1 & year==`y') // HH head status remains the same for all members of those HHs
	replace hhheadid=hhheadidl1 if (chknumhh==1 & chkcomphh==1 & year==`y')
	* Assign household and household head IDs
	bysort xwaveid (year) : replace hhid0=hhid0[_n-1] if hhhead==1 & year==`y' // Keep original HH ID (missing for new households!)
	replace hhhead =-1*hhhead
	sort year hhidnum hhhead
	by year hhidnum : replace hhid0 = hhid0[_n-1] if hhhead>-1 & hhhead!=.
	replace hhhead =-1*hhhead
	
	
	****************************************************************************
	************* 4b. Identify household splits ********************************
	****************************************************************************
	** Any household split (compositional change or decrease in size)
	bysort year hhidnuml1 (hhidnum) : gen chksplit = ((hhidnum!=hhidnum[1] | hhpers<hhpersl1) & hhidnuml1!=. & year==`y') // Is hhid still the same for all members of previous household?
	bysort year hhidnuml1 : egen hhsplit = max(chksplit) if year==`y' & hhidnuml1!=. // Household split if at least one member is not in hh any more
	replace splits = hhsplit if year==`y'
	drop chksplit
	noisily tab hhsplit // Frequency of household splits (on individual member level)
	** Identify reasons for split (determines if HH ID survives)
	* Household head left household (moved out, died or left sample) [Household ID moves on to remaining members]
	gen exheaddead = (hhlobsl1<year & hhidnuml1!=.)  // Has the former household head died (left sample)?
	bysort year hhidnuml1 : egen hhsplithead = max(exheaddead) if year==`y'
	replace splitshead = hhsplithead if year==`y'
	noisily tab hhsplithead // Frequency of "death" of household head (on individual member level)
	* Couple or multifamily household split up [Every new household gets a new ID]
	* Couple split
	gen trash = (hhsplit==1 & hhrihl1<5 & (hhpxid!=hhpxidl1) & (ncpural1<3 & nfamsl1<=1) & year==`y' & hhidnuml1!=. & hhsplithead!=1)
	bysort year hhidnuml1 : egen hhsplitcpl = max(trash) if year==`y'
	replace splitscpl = hhsplitcpl if year==`y'
	drop trash
	noisily tab hhsplitcpl  // Frequency of couple splits (on individual member level)
	* Multifamily split
	gen trash = (hhsplit==1 & ((ncpural1>2 & ncpural1!=.) | (nfamsl1>1 & nfamsl1!=.) | (ncpural1==2 & ncpural1!=.  & ncplsl1==0)) & year==`y' & hhidnuml1!=. & hhsplithead!=1)
	bysort year hhidnuml1 : egen hhsplitmf = max(trash) if year==`y'
	replace splitsmf = hhsplitmf if year==`y'
	drop trash
	noisily tab hhsplitmf  // Frequency of multi-family household splits (on individual member level)
	sort year hhidnum
	* Other member left household (moved out, died or left sample, but no couple or multifamily hh-split!) [Household ID remains with previous household head, leaving members get new ID]
	gen hhsplitnonhead = hhsplit-hhsplithead-hhsplitcpl-hhsplitmf
	gen trash = (hhsplitnonhead==1 & hhheadl1==0)
	bysort year hhidnum : egen hhsplitnonheadnewhh = min(trash) if hhidnuml1!=.
	replace splitsnonhead = hhsplitnonhead if year==`y'
	drop trash
	noisily tab hhsplitnonhead // Frequency of any other split (on individual member level)
	noisily disp "hhsplitnonhead==-1 for cases when HH head left, but remaining members stayed together and added individual to household"
	
	****************************************************************************
	************* 4c. Identify household mergers *******************************
	****************************************************************************
	* Merger with someone outside of HILDA?
	gen newres = (hhidnuml1==.) // Not in HILDA in PREVIOUS period (cannot be household head, household info is not carried over more than one period)
	bysort year hhidnum : egen hhmergeout = max(newres) if year==`y'
	* Identify which outside merger it is (for statistical purposes only (new child, lowest or highest income))
	* New child
	gen trash1 = newres if year==`y' & (hhrih==8 | hhrih==9)
	bysort year hhidnum : egen hhmergeoutchild = max(trash1)
	* Lower income
	bysort year hhidnum : egen incrank = rank(inc)
	replace incrank = hhpers+1-incrank
	gen trash2 = newres if year==`y' & incrank>1 & (hhrih!=8 & hhrih!=9)
	bysort year hhidnum : egen hhmergeoutli = max(trash2)
	* Higher income
	gen trash3 = newres if year==`y' & incrank==1 & (hhrih!=8 & hhrih!=9)
	bysort year hhidnum : egen hhmergeouthi = max(trash3)
	drop trash*
	replace hhmergeoutchild = 0 if hhmergeoutchild==. & year==`y'
	replace hhmergeoutli = 0 if hhmergeoutli==. & year==`y'
	replace hhmergeouthi = 0 if hhmergeouthi==. & year==`y'
	* Account for immigration of multiple people (high income dominates over low income, dominates over child)
	replace hhmergeoutchild = 0 if (hhmergeoutli==1 | hhmergeouthi==1) & year==`y'
	replace hhmergeoutli = 0 if hhmergeouthi==1 & year==`y'
	replace hhmergeoutchild = 0 if tuyear==1
	replace hhmergeoutli = 0 if tuyear==1
	replace hhmergeouthi = 0 if tuyear==1
	* Store results
	replace mergeout = hhmergeout if year==`y'
	replace mergeoutchild = hhmergeoutchild if year==`y'
	replace mergeoutli = hhmergeoutli if year==`y'
	replace mergeouthi = hhmergeouthi if year==`y'
	drop hhmergeoutchild hhmergeoutli hhmergeouthi incrank
	
	* Merger with someone inside of HILDA?
	bysort year hhidnum newres (hhidnuml1): gen chkmergein = (hhidnuml1!=hhidnuml1[1] & hhidnuml1!=. & hhidnuml1[_n-1]!=. & year==`y')
	bysort year hhidnum : egen hhmergein = max(chkmergein) if year==`y'
	replace mergein = hhmergein if year==`y'
	drop chkmergein
	
	****************************************************************************
	************* 5. Determine HH heads and IDs for splits and mergers *********
	****************************************************************************	
	*** No change in HH head and HH ID
	* HH split but head still alive, no couple- or multi-family split OR: merger with someone OUTSIDE of HILDA
	replace hhhead = hhheadl1 if hhhead==. & hhheadl1==1 & ((hhsplit==1 & hhsplithead==0 & hhsplitcpl==0 & hhsplitmf==0) | (hhmergeout==1 & hhmergein==0)) & hhmergein!=1 & year==`y'
	replace hhheadid = xwaveid if hhheadid==. & year==`y' & hhhead==1
	bysort xwaveid (year) : replace hhid0 = hhid0[_n-1] if hhid0=="" & year==`y' & hhhead==1
	bysort year hhidnum : egen totalheads = total(hhhead==1)
	replace hhhead=0 if hhhead==. & totalheads==1 & year==`y'
	replace hhhead = -1*hhhead
	bysort year hhidnum (hhhead): replace hhid0 = hhid0[_n-1] if hhid0=="" & year==`y' & totalheads==1
	bysort year hhidnum (hhhead): replace hhheadid = hhheadid[_n-1] if hhheadid==. & year==`y' & totalheads==1
	replace hhhead = -1*hhhead
	drop totalheads	
	
	*** Identify as many individuals as possible who do not qualify as HH heads
	* New sample members can never be household heads
	replace hhhead=0 if hhidnuml1==. & year==`y'
	* Dummy for CSMs that are hhrih==11|13 in new non-csm household (they will now REMAIN HH HEAD!)
	bysort year hhidnum : egen noncsmhh = total(hhheadl1==.) if year==`y'
	bysort year hhidnum : gen csm1113 = (hhheadl1!=. & (hhrih==11 | hhrih==13)) & noncsmhh>0 & year==`y'
	* Child younger than 15 or dependent student
	replace hhhead = 0 if hhhead==. & hhrih>=8 & hhrih<=9 & year==`y' // DEPENDENT children are never HH heads, NONDEPENDENT children may be household heads if they (re)form a household with their (e.g. elderly) parents
	* Other family member not part of a couple or parent-child relationship (unless only CSM in new HH)
	replace hhhead = 0 if hhhead==. & hhheadl1!=1 & hhrih==11 & hhrih0<8 & csm1113!=1 & year==`y'
	* Unrelated individuals to all other HH members but a couple or parent is present in HH (unless only CSM in new HH)
	replace hhhead = 0 if hhhead==. & hhheadl1!=1 & hhrih==13 & hhrih0<8 & csm1113!=1 & year==`y'	
	
	*** Identify obvious household heads
	* Lone persons are automatically household heads
	replace hhhead = 1 if hhhead==. & hhrih == 12 & year==`y'	
	* Individuals that are the only potential household head after excluding all above (Lone parents)
	bysort year hhidnum : egen numpothead = total(hhhead==.)
	replace hhhead = 1 if hhhead==. & numpothead==1 & year==`y'
	replace hhheadid = xwaveid if hhheadid==. & year==`y' & hhhead==1	
	
	
	*** Change in HH head but not in HH ID
	* HH head died / left sample
	bysort year hhidnuml1 : egen uniquenewhead = total(hhhead) if hhsplithead==1 & year==`y'
	bysort xwaveid (year) : replace hhid0 = hhid0[_n-1] if hhhead==1 & hhsplithead==1 & uniquenewhead<=1 & year==`y'
	replace hhhead = -1*hhhead
	bysort year hhidnum (hhhead) : replace hhid0 = hhid0[_n-1] if hhid0=="" & year==`y'
	bysort year hhidnum (hhhead) : replace hhheadid = hhheadid[_n-1] if hhheadid==. & year==`y'
	replace hhhead = -1*hhhead
	
	
	*** Change in both HH head and HH ID
	* Households formed after split
	replace hhid0 = yearstr+wave2+hhid if hhid0=="" & (hhsplitcpl==1 | hhsplitmf==1 | (hhsplitnonheadnewhh==1)) & hhmergein!=1 & year==`y'
	replace hhhead = -1*hhhead
	bysort year hhidnum (hhhead) : replace hhid0=hhid0[_n-1] if hhid0=="" & year==`y'
	bysort year hhidnum (hhhead) : replace hhheadid=hhheadid[_n-1] if hhheadid==. & year==`y'
	replace hhhead = -1*hhhead
	
	
	*** Within HILDA Merger
	* Did one of the households exist for longer? (uniquely)
	replace hhheadl1 = -1*hhheadl1
	bysort year hhidnum (hhfobsl1 hhheadl1) : gen olderhh = _n
	bysort year hhidnum (hhfobsl1 hhheadl1) : gen trash = (hhfobsl1!=hhfobsl1[1] & hhfobsl1!=.) | (hhfobsl1==hhfobsl1[1] & hhfobsl1[_n+1]==.)
	bysort year hhidnum (hhfobsl1 hhheadl1) : egen hhageuniq = max(trash)
	drop trash
	replace hhhead = 1 if hhhead==. & (hhmergein==1 & hhmergeout==0) & hhheadl1==-1 & olderhh==1 & hhageuniq!=0 & year==`y'
	bysort year hhidnum : egen totalheads = total(hhhead==1)
	replace hhhead=0 if hhhead==. & totalheads==1 & year==`y'
	drop totalheads
	replace hhheadl1 = -1*hhheadl1
	drop olderhh
	replace hhheadid = xwaveid if hhheadid==. & year==`y' & hhhead==1
	bysort xwaveid (year) : replace hhid0 = hhid0[_n-1] if hhid0=="" & year==`y' & hhhead==1 & (uniquenewhead<=1 | uniquenewhead==.)
	replace hhhead = -1*hhhead
	bysort year hhidnum (hhhead) : replace hhid0=hhid0[_n-1] if hhid0=="" & year==`y'
	bysort year hhidnum (hhhead) : replace hhheadid = hhheadid[_n-1] if hhheadid==. & year==`y'
	replace hhhead = -1*hhhead
	
	
	*** Apply tiebreaker procedure to determine household heads for undetermined households
	drop pothead
	gen pothead = (hhhead==. & year==`y')
	drop numpothead
	bysort year hhidnum : egen numpothead = total(pothead==1) if year==`y'
	replace hhhead = 1 if hhhead==. & pothead==1 & numpothead==1 & year==`y' // Assign household head status to individual who has been in sample (only potential head)
	replace hhhead = 0 if hhhead==. & pothead==0 & year==`y'
	bysort year hhidnum : egen numlaghead = total(hhheadl1==1) if year==`y' // Assign household head status to individuals that have been household head (tiebreaker will apply for multiple heads)
	replace hhhead = 1 if hhhead==. & hhheadl1==1 & numlaghead==1 & year==`y'
	replace hhhead = 0 if hhhead==. & hhheadl1==0 & numlaghead==1 & year==`y'
	
	** Tiebreaker procedure as above
	replace pothead = hhhead==. & year==`y' // You are a potential head if you haven't been assigned a value
	* Highest income in the current year
	bysort year hhidnum pothead : egen incm = max(inc) if year==`y'
	sort year hhidnum hhhead inc xwaveid
	by year hhidnum : gen inc1 = (_n==_N) if year==`y'
	by year hhidnum : replace hhhead = 1 if (inc1==1 & inc!=inc[_n-1] & pothead==1 & year==`y')
	by year hhidnum : replace hhhead = 0 if (inc!=incm & pothead==1 & year==`y')
	drop inc1 incm
	replace pothead = 0 if hhhead==1 | hhhead==0
	* Oldest person
	bysort year hhidnum pothead : egen hgagem = max(hgage) if year==`y'
	sort year hhidnum hhhead hgage xwaveid
	by year hhidnum : gen hgage1 = (_n==_N) if year==`y'
	by year hhidnum : replace hhhead = 1 if (hgage1==1 & hgage!=hgage[_n-1] & pothead==1 & year==`y')
	by year hhidnum : replace hhhead = 0 if (hgage!=hgagem & pothead==1 & year==`y')
	drop hgage1 hgagem
	replace pothead = 0 if hhhead==1 | hhhead==0
	* Xwaveid (random)
	sort year hhidnum hhhead xwaveid
	by year hhidnum : generate hhobs = (_n==_N) if year==`y'
	by year hhidnum : replace hhhead = 1 if (hhobs==1 & pothead==1 & year==`y')
	by year hhidnum : replace hhhead = 0 if (hhobs==0 & hhhead==. & year==`y')
	drop hhobs
	replace pothead = 0 if hhhead==1 | hhhead==0
	
	*** Remaining households are where dependent child has joined non-CSM household (create new household)
	bysort year hhidnum : egen chkhhhead = total(hhhead) if year==`y'
	replace hhhead = . if chkhhhead==0 & year==`y'
	* Identify as many individuals as possible who do not qualify as HH heads
	replace hhhead = 0 if hhhead==. & hhrih>=8 & hhrih<=9 & year==`y'
	replace hhhead = 0 if hhhead==. & hhrih==11 & hhrih0<8 & year==`y'
	replace hhhead = 0 if hhhead==. & hhrih==13 & hhrih0<8 & year==`y'
	* Individuals that are the only potential household head after excluding all above (Lone parents)
	drop numpothead
	bysort year hhidnum : egen numpothead = total(hhhead==.)
	replace hhhead = 1 if hhhead==. & numpothead==1 & year==`y'
	* Tiebreaker rules for households with couples or multiple lone parents
	replace pothead = hhhead==. if year==`y' // You are a potential head if you haven't been assigned a value
	* Highest income in the first year the household is observed
	bysort year hhidnum pothead : egen incm = max(inc) if year==`y'
	sort year hhidnum hhhead inc xwaveid
	by year hhidnum : gen inc1 = (_n==_N) if year==`y'
	by year hhidnum : replace hhhead = 1 if (inc1==1 & inc!=inc[_n-1] & pothead==1 & year==`y')
	by year hhidnum : replace hhhead = 0 if (inc!=incm & pothead==1 & year==`y')
	drop inc1 incm
	replace pothead = 0 if hhhead==1 | hhhead==0
	* Oldest person
	bysort year hhidnum pothead : egen hgagem = max(hgage) if year==`y'
	sort year hhidnum hhhead hgage xwaveid
	by year hhidnum : gen hgage1 = (_n==_N) if year==`y'
	by year hhidnum : replace hhhead = 1 if (hgage1==1 & hgage!=hgage[_n-1] & pothead==1 & year==`y')
	by year hhidnum : replace hhhead = 0 if (hgage!=hgagem & pothead==1 & year==`y')
	drop hgage1 hgagem
	replace pothead = 0 if hhhead==1 | hhhead==0
	* Xwaveid (random)
	sort year hhidnum hhhead xwaveid
	by year hhidnum : gen hhobs = (_n==_N) if year==`y'
	by year hhidnum : replace hhhead = 1 if (hhobs==1 & pothead==1 & year==`y')
	by year hhidnum : replace hhhead = 0 if (hhobs==0 & hhhead==. & year==`y')
	drop hhobs
	replace pothead = 0 if hhhead==1 | hhhead==0
	
	
	*** Assign household head ID to all members of household
	replace hhhead =-1*hhhead
	sort year hhidnum hhhead
	by year hhidnum : replace hhheadid = xwaveid if hhhead==-1 & hhheadid==.
	by year hhidnum : replace hhheadid = hhheadid[_n-1] if hhhead>-1 & hhheadid==.
	by year hhidnum : replace hhid0 = hhid0[_n-1] if hhhead>-1 & hhid0==""
	replace hhhead =-1*hhhead
	
	
	*** Assign new household ID to remaining newly formed households
	replace hhid0=yearstr+wave2+hhid if hhid0=="" & year==`y'	
	by year hhidnum : gen chk = (hhid0==hhid0[1]) if year==`y'
	cap assert chk==1 if year==`y'
	if _rc!=0 {
		display as error "Assert failed position 4"
		exit  999
	}
	
	
	*** First and last observation for each household or household head
	* In which year is current household head last observed in sample? (Necessary to track deaths)
	bysort year hhidnum : gen trash = lobs if hhhead==1 & year==`y'
	by year hhidnum : egen trash2 = total(trash)
	replace hhlobs = trash2 if year==`y'
	drop trash*	
	* Capture year of household's first appearance
	gen trash = substr(hhid0,1,4)
	destring trash, replace
	replace hhfobs = trash if year==`y'
	drop trash
	
	
	*** FINAL CHECKS TO ENSURE UNIQUENESS OF HOUSEHOLDS
	* Check that hhhead is never missing and that each household has exactly one household head
	cap assert !missing(hhhead) if year==`y'
	if _rc!=0 {
		display as error "Assert failed position 5"
		exit  999
	}
	capture drop chkhhhead
	by year hhidnum : egen chkhhhead = total(hhhead) if year==`y'
	cap assert chkhhhead==1 if year==`y'
	if _rc!=0 {
		display as error "Assert failed position 6"
		exit  999
	}
	* Check that HHID0 is indeed different for separate households
	bysort year hhid0 (hhidnum): gen trash = (hhidnum!=hhidnum[1]) if hhid0!="" & year==`y'
	bysort year hhid0 (hhidnum): egen chkhhid0 = max(trash)
	cap assert chkhhid0==0 if year==`y'
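	if _rc!=0 {
		display as error "Assert failed position 7"
		* Hard-coded exception: in 2018, drop the conflicting records under this household ID instead of aborting
		if `y'==2018 {
			drop if hhid0=="2014n125951" & (hhrih==1 | hhrih==8) & year==`y'
		}
		else {
			exit  999	
		}
	}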
	drop trash
	
	
	*** Drop temporary variables
	drop chk* hhsplit* newres hhmerge* csm1113 noncsmhh hhageuniq exheaddead numpothead numlaghead uniquenewhead
	if `saveinterim'==1 {
		save "`writedatadir'/UHH`y'.dta", replace
	}
}	
}
save "`writedatadir'`unbalancedHHname'RP.dta", replace

********************************************************************************
************* 6. Aggregate individual information to household level ***********
********************************************************************************
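* Aggregates use adult members only (hgage>17, hhrih not 8 or 9) with values above -1;
* rows that fail these conditions are left missing
foreach var of local varmean {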
	bysort year hhidnum : egen `var'bar = mean(`var') if (hhrih<8 | hhrih>9) & hgage>17 & `var'>-1
}
foreach var of local varmin {
	bysort year hhidnum : egen `var'min = min(`var') if (hhrih<8 | hhrih>9) & hgage>17 & `var'>-1
}
foreach var of local varmax {
	bysort year hhidnum : egen `var'max = max(`var') if (hhrih<8 | hhrih>9) & hgage>17 & `var'>-1
}
foreach var of local varrange {
	gen `var'rng = `var'max-`var'min if (hhrih<8 | hhrih>9) & hgage>17 & `var'>-1
}
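* Lagged household totals: sum each member's previous observation within the household
* (variables starting with "hw" are first divided by the lagged number of adults, hhadult).
* Note that [_n-1] is the member's previous observation, which is not necessarily the previous year.
foreach var of local varlag {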
	if substr("`var'",1,2)=="hw" {
		bysort xwaveid (year) : gen trash = `var'[_n-1]/hhadult[_n-1]
	} 
	else {
		bysort xwaveid (year) : gen trash = `var'[_n-1]
	}
	bysort year hhidnum : egen `var'l1 = total(trash)
	drop trash
}
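
* --- Illustrative sketch (toy data, safe to delete; not part of the original procedure) ---
* The "egen ... if" pattern used in this section fills the aggregate only on rows
* that satisfy the condition. The variables below (hh, age, x, xbar, xbar_all) are
* hypothetical. Spreading the value to every row with a second egen is one possible
* approach, shown here as an assumption rather than something the section does.
preserve
clear
set obs 5
gen hh  = cond(_n<=3, 1, 2)
gen age = 40
replace age = 5 in 3
gen x = 10*_n            // 10 20 30 40 50
replace x = -1 in 3/4    // negative code on the child in hh 1 and on one adult in hh 2
bysort hh : egen xbar = mean(x) if age>17 & x>-1
assert xbar==15 if hh==1 & age>17      // mean of 10 and 20
assert missing(xbar) if x==-1          // excluded rows stay missing
bysort hh : egen xbar_all = max(xbar)  // copy the household value to every row
assert xbar_all==15 if hh==1
assert xbar_all==50 if hh==2
restore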

********************************************************************************
************* 7. Bridge Gaps in HH *********************************************
********************************************************************************
** Identify households by a composite key (by year): household size followed by the largest and smallest member xwaveid
bysort hhid0 year : egen sumxwaveidr1 = max(xwaveid)
bysort hhid0 year : egen sumxwaveidr2 = min(xwaveid)
tostring sumxwaveidr1, replace
tostring sumxwaveidr2, replace
gen hhperss = hhpers
tostring hhperss, replace
gen sumxwaveid = hhperss+sumxwaveidr1+sumxwaveidr2
destring sumxwaveid, replace
drop sumxwaveidr*
order sumxwaveid xwaveid
* Ensure that individuals with an identical composite key are indeed in the same HH
bysort sumxwaveid year : egen hhidmax = max(hhidnum)
bysort sumxwaveid year : egen hhidmin = min(hhidnum)
bysort sumxwaveid year : gen hhiddiff = hhidmax-hhidmin
gen hhideq = (hhiddiff==0)
cap assert hhideq==1
if _rc!=0 {
	display as error "Assert failed position 8"
	exit  999
}
drop hhidmax hhidmin hhiddiff hhideq
** Check for gaps in member history
bysort xwaveid (year) : gen gapyears = year-year[_n-1]-1
bysort sumxwaveid year : egen hhgap = max(gapyears)
replace hhgap = 1 if hhgap>1 & hhgap!=.
bysort sumxwaveid year : egen hhgaphead = max(gapyears) if hhhead==1
replace hhgaphead = 1 if hhgaphead>1 & hhgaphead!=.
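* --- Illustrative sketch (toy data, safe to delete; not part of the original procedure) ---
* How the gap measure above behaves on a hypothetical member history (id and year
* are made-up values): gapyears is missing for a first observation, 0 for
* consecutive years and positive after a gap, so it is never negative.
preserve
clear
set obs 5
gen id = cond(_n<=3, 1, 2)
gen year = 2000 + _n            // 2001 2002 2003 2004 2005
replace year = 2005 in 3        // id 1: 2001 2002 2005
replace year = year - 1 in 4/5  // id 2: 2003 2004
bysort id (year) : gen gapyears = year - year[_n-1] - 1
assert missing(gapyears) if (id==1 & year==2001) | (id==2 & year==2003)
assert gapyears==0 if (id==1 & year==2002) | (id==2 & year==2004)
assert gapyears==2 if id==1 & year==2005
count if gapyears<0
assert r(N)==0
restore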
** Bring forward original household ID and household head status
bysort sumxwaveid xwaveid (year) : gen gaprepl = 1 if year==year[_n-1]+gapyears+1 & hhgaphead==1 & year==hhfobs
bysort sumxwaveid (xwaveid year) : egen hhgaprepl = max(gaprepl)
bysort sumxwaveid xwaveid (year) : replace hhid0 = hhid0[_n-1] if hhgaprepl==1 & hhid0[_n-1]!="" 
bysort sumxwaveid xwaveid (year) : replace hhhead = hhhead[_n-1] if hhgaprepl==1  & hhhead[_n-1]!=.
bysort sumxwaveid year : egen nomergeout = max(gaprepl)
replace mergeout = 0 if nomergeout==1
replace mergeoutchild = 0 if nomergeout==1
replace mergeoutli = 0 if nomergeout==1
replace mergeouthi = 0 if nomergeout==1
bysort hhidnum year : egen hhheadtot = total(hhhead)
cap assert hhheadtot==1
if _rc!=0 {
	display as error "Assert failed position 9"
	exit  999
}
drop hhheadtot sumxwaveid gaprepl


********************************************************************************
************* 7.5 AB EXTRA *****************************************************
********************************************************************************
// Flag for an additional income earner (income of at least $100, aged 21 or over) in HH: nondependent child, other family member, or unrelated person
g othmem = (hhrih==10 | hhrih==11 | hhrih==13) & hgage>=21 & inc>=100
bysort hhid0 year : egen othearner = max(othmem)

********************************************************************************
************* 8. Summary Output ************************************************
********************************************************************************
*** Count individuals by year
tab year

*** Splits and mergers by individuals
replace mergeout = 0 if tuyear==1
tabstat splitscpl splitsmf splitshead splitsnonhead, by(year) stat(mean)
tabstat splitscpl splitsmf splitshead splitsnonhead, by(year) stat(sum)
tabstat mergein mergeout mergeoutchild mergeoutli mergeouthi, by(year) stat(mean)
tabstat mergein mergeout mergeoutchild mergeoutli mergeouthi, by(year) stat(sum)

** Count frequency of relative income changes that could lead to change in household head
bysort hhidnum year : egen incrank = rank(inc)
replace incrank = hhpers+1-incrank
bysort xwaveid (year) : gen incranklag = incrank[_n-1]
* hhgap is 0 (no gap), 1 (gap) or missing, so the no-gap condition is hhgap==0
bysort xwaveid (year) : gen incswitch1 = (incrank>1 & incranklag==1 & hhhead==0 & hhhead[_n-1]==1 & year==year[_n-1]+1 & mergein!=1 & hhgap==0) // Was head with highest income last period but not any more
bysort xwaveid (year) : gen incswitch2 = (incrank>1 & incranklag==1 & hhhead==0 & hhhead[_n-1]==1 & hhhead[1]==1 & year==year[_n-1]+1 & mergein!=1 & hhgap==0)	   // Was original head in first period and highest income in last period but not any more
sum incs*
sort hhid0 year

*** Count number of households by year
keep if hhhead==1
sort hhid0 year
tab year

*** Splits and mergers by households
tabstat splitscpl splitsmf splitshead splitsnonhead, by(year) stat(mean)
tabstat splitscpl splitsmf splitshead splitsnonhead, by(year) stat(sum)
tabstat mergein mergeout mergeoutchild mergeoutli mergeouthi, by(year) stat(mean)
tabstat mergein mergeout mergeoutchild mergeoutli mergeouthi, by(year) stat(sum)

*** Count number of observations by household
preserve
bysort hhid0 (year) : gen nobs = _n
bysort hhid0 (year) : gen Nobs = _N
keep if nobs==1
tab Nobs
sum Nobs, det
restore

** First observation by household
preserve
bysort hhid0 (year) : gen nobs = _n
bysort hhid0 (year) : gen Nobs = _N
keep if nobs==1
tab hhfobs
sum hhfobs, det
restore

** Consecutive observations
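bysort hhid0 (year) : gen consecobs = (year==(year[_n-1]+1) | year[_n-1]==.)
bysort hhid0 (year) : gen spell = sum(consecobs==0) // Run counter: increases at every gap
bysort hhid0 spell (year) : gen streak = _n // Position within the current run of consecutive years
bysort hhid0 (year) : egen lstreak = max(streak) // Longest run per household
drop spell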
preserve
bysort hhid0 (year) : gen nobs = _n
bysort hhid0 (year) : gen Nobs = _N
keep if nobs==1
tab lstreak
sum lstreak, det
restore
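
* --- Illustrative sketch (toy data, safe to delete; not part of the original procedure) ---
* A toy check of run-length logic for consecutive yearly observations. All names
* (hh, newrun, run, runlen, longest) are hypothetical. A new run starts at the first
* observation and after every gap; the longest run is the largest within-run position.
preserve
clear
set obs 5
gen hh = 1
gen year = 2000 + _n
replace year = year + 1 if _n>=3   // 2001 2002 2004 2005 2006
bysort hh (year) : gen byte newrun = (_n==1) | (year!=year[_n-1]+1)
bysort hh (year) : gen run = sum(newrun)
bysort hh run (year) : gen runlen = _n
bysort hh : egen longest = max(runlen)
assert runlen==2 if year==2002
assert runlen==1 if year==2004
assert longest==3
restore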

** Count number of gaps
bysort hhid0 (year) : gen nonconsecobs = (year!=(year[_n-1]+1) & year[_n-1]!=.)
bysort hhid0 (year) : egen NONCONSECOBS = max(nonconsecobs)
bysort hhid0 (year) : gen gaplength = year-year[_n-1]-1 if year[_n-1]!=.
sum NONCONSECOBS
sum gaplength if gaplength>0 & gaplength!=., det
drop incs* nonconsecobs NONCONSECOBS gaplength

save "`writedatadir'`unbalancedHHname'.dta", replace


********************************************************************************
************* 9. Create balanced panel  ****************************************
********************************************************************************

* Keep only households observed in every wave (assumes wave runs from 1 to the number of waves)
bysort hhid0 (year) : gen Nobs = _N
egen trash = max(wave)
keep if Nobs==trash
drop Nobs trash
save "`writedatadir'`balancedHHname'.dta", replace
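
* --- Illustrative sketch (toy data, safe to delete; not part of the original procedure) ---
* A toy version of the balancing step. Here the number of periods is counted from
* the distinct years in the data, which is an assumed alternative to using the
* maximum wave number. All names (hh, year, Nobs, demo_yrs, demo_T) are hypothetical.
preserve
clear
set obs 5
gen hh = cond(_n<=3, 1, 2)
gen year = cond(_n<=3, 2000+_n, 2000+_n-3)   // hh 1: 2001-2003; hh 2: 2001-2002
levelsof year, local(demo_yrs)
local demo_T : word count `demo_yrs'
bysort hh (year) : gen Nobs = _N
keep if Nobs==`demo_T'
assert hh==1       // only the fully observed household remains
assert _N==3
restore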

*** Splits and mergers by households
tabstat splitscpl splitsmf splitshead splitsnonhead, by(year)
tabstat mergein mergeout, by(year)


quietly{
********************************************************************************
************ 10. Version control ***********************************************
********************************************************************************
*20171015 Original do-file created, Benjamin Beckers
}
